The Xavier Musketeers are putting on a good show in the tough Big East conference this year! Part of what makes this team special to watch is the depth of the roster. From leading scorer Souley Boum to 3-point sniper Adam Kunkel, it seems like anyone on the roster is liable to go for 20. Coach Sean Miller has done a phenomenal job getting the ball to players in situations where they have a strong advantage.
But this got me thinking about ways to measure the distribution of points within a basketball team, and whether or not "sharing the rock" has any correlation to other performance metrics. In economics there is a concept called the Gini index, which is used to measure the income inequality of a given nation. The graph that this concept is charted on is called a Lorenz curve, named after American economist Max O. Lorenz. While the Lorenz curve and the Gini index are classically used to measure income inequality, I thought I would have some fun by applying the concepts to basketball.
The Lorenz curve is beautiful in its simplicity, but it can give us insights into how productivity is distributed. Along the Y-axis is the cumulative share of total income. Along the X-axis is the cumulative share of the total population, with people ordered from lowest income to highest. In plain English, we plot each point on the graph based on how much of the population we have counted so far and how much of the total income that group is responsible for. When we connect the dots, we have made ourselves a Lorenz curve. This curve is then compared to a 45-degree line that cuts the graph diagonally with a slope of 1 and represents perfect equality.
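To make the construction concrete for basketball, here is a minimal Python sketch that swaps "income" for points scored and "population" for the roster. The function name `lorenz_points` and the five-player scoring line are made up for illustration; they are not real Xavier numbers, and this is not necessarily how the graphs in this post were built.

```python
from itertools import accumulate


def lorenz_points(points_scored):
    """Return the Lorenz curve as (roster share, points share) pairs.

    The curve always starts at (0, 0) and ends at (1, 1).
    """
    ordered = sorted(points_scored)  # lowest scorer first
    n = len(ordered)
    total = sum(ordered)
    if n == 0 or total == 0:
        raise ValueError("need at least one player and a nonzero point total")

    curve = [(0.0, 0.0)]
    for i, running_total in enumerate(accumulate(ordered), start=1):
        # x: share of the roster counted so far, y: share of points they scored
        curve.append((i / n, running_total / total))
    return curve


# Hypothetical five-player scoring line (illustrative only)
example = [2, 8, 10, 20, 40]
curve = lorenz_points(example)
for roster_share, points_share in curve:
    print(f"{roster_share:.2f} of roster -> {points_share:.3f} of points")
```

The further these dots sag below the diagonal, the more the scoring is concentrated in a few players.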
Now, this is where the mathematics of it gets funky. I'm going to put a trigger warning in here because we're about to talk calculus.
If you take the area between the perfect equality line and the Lorenz curve and divide it by the total area underneath the equality line, you get a number between 0 and 1 called the Gini coefficient. The Gini coefficient is often multiplied by 100 to give the Gini index, which is a number between 0 and 100. Most people use the terms interchangeably in casual conversation.
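For anyone who wants the calculus spelled out, here is one standard way to write that ratio. The symbols are introduced here for illustration: $L(x)$ is the Lorenz curve, $A$ is the area between the equality line and the curve, and $B$ is the area under the curve.

$$
G = \frac{A}{A + B} = \frac{\tfrac{1}{2} - \int_0^1 L(x)\,dx}{\tfrac{1}{2}} = 1 - 2\int_0^1 L(x)\,dx
$$

Because the equality line is the diagonal of a unit square, the area underneath it is $\tfrac{1}{2}$, which is where the factor of 2 comes from.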
A 1.00 Gini coefficient means perfect inequality.
A 0.00 Gini coefficient means perfect equality.
One simple trick for remembering which one is which: 1 means one person has everything, and 0 means no one has any more than anyone else.
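Here is a rough Python sketch of that calculation for a single team. It approximates the area under the Lorenz curve with trapezoids and then applies the area ratio described above. The function name `gini` and the sample scoring lines are hypothetical, and the numbers in this post may have been computed with a different estimator, so treat this as an illustration rather than the exact method used here.

```python
def gini(values):
    """Gini coefficient from the trapezoid area under the Lorenz curve."""
    ordered = sorted(values)  # lowest scorer first
    n = len(ordered)
    total = sum(ordered)
    if n == 0 or total == 0:
        raise ValueError("need at least one player and a nonzero point total")

    area_under_lorenz = 0.0
    previous_share = 0.0
    running_total = 0
    for value in ordered:
        running_total += value
        share = running_total / total
        # Each player adds a trapezoid of width 1/n under the curve
        area_under_lorenz += (previous_share + share) / 2 / n
        previous_share = share

    # Area under the equality line is 1/2, so (1/2 - area) / (1/2) = 1 - 2 * area
    return 1 - 2 * area_under_lorenz


# Hypothetical scoring lines (illustrative only)
balanced = gini([10, 10, 10, 10])    # everyone scores the same
one_man_show = gini([0, 0, 0, 40])   # one player scores everything
mixed = gini([2, 8, 10, 20, 40])

print(round(balanced, 4), round(one_man_show, 4), round(mixed, 4))
```

One wrinkle worth knowing: with a small roster, this version of the calculation cannot actually reach 1. With four players, the "one player scores everything" case tops out at 0.75, and it only approaches 1 as the roster gets larger.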
While this was a fun exercise, I could find no correlation between a team's scoring Gini index and any other measure of performance. All of the correlation graphs look like a sneeze :(.
I think the takeaway here is that teams adopt different strategies based on personnel, and those strategies can be equally effective. While some teams like Xavier have a full roster of real tough guys ready to #zip-em-up, other schools rely on one main scorer to get the job done. Check out the graphs below to see where your favorite team falls on the spectrum.
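For readers who want to poke at this themselves, here is a small Python sketch of a Pearson correlation check. The five rows are copied from the table of Gini coefficients later in this post, and reading the last column as each team's point total is my assumption. The helper name `pearson_r` is made up for illustration, and five hand-picked rows are far too few to conclude anything, so this only shows the mechanics.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return covariance / sqrt(spread_x * spread_y)


# (school, Gini coefficient, points) rows copied from the table in this post;
# treating the third value as total points is an assumption
rows = [
    ("Seton Hall", 0.2524670, 1439),
    ("Duke", 0.2922915, 1427),
    ("Xavier", 0.4037801, 1746),
    ("Kansas", 0.4944242, 1500),
    ("Utah", 0.5207197, 1572),
]

gini_values = [row[1] for row in rows]
point_totals = [row[2] for row in rows]
r = pearson_r(gini_values, point_totals)
print(f"Pearson r for this small sample: {r:.3f}")
```

A value near 0 means little linear relationship, while values near 1 or -1 mean a strong one. Running the same check over all of the teams is the fairer test.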
## School Gini Points
## 1 Seton Hall 0.2524670 1439
## 2 Notre Dame 0.2663559 1475
## 3 Tennessee 0.2730346 1479
## 4 Boston College 0.2749542 1388
## 5 Oklahoma State 0.2811487 1358
## 6 Georgia Tech 0.2871251 1367
## 7 Duke 0.2922915 1427
## 8 Iowa State 0.3199851 1341
## 9 Colorado 0.3230380 1580
## 10 Florida 0.3266650 1436
## 11 Virginia 0.3266878 1259
## 12 Indiana 0.3325346 1565
## 13 Providence 0.3429003 1655
## 14 Mississippi 0.3431339 1350
## 15 Baylor 0.3443604 1587
## 16 Texas 0.3449680 1607
## 17 Marquette 0.3467986 1718
## 18 Connecticut 0.3470520 1730
## 19 Illinois 0.3479524 1514
## 20 California 0.3504241 1211
## 21 Stanford 0.3533382 1358
## 22 Oklahoma 0.3536330 1335
## 23 Georgia 0.3536877 1392
## 24 St. John's (NY) 0.3603776 1618
## 25 TCU 0.3610392 1546
## 26 Washington 0.3670962 1524
## 27 Minnesota 0.3710660 1182
## 28 Arizona State 0.3760486 1510
## 29 Vanderbilt 0.3800716 1439
## 30 Pittsburgh 0.3815385 1560
## 31 Missouri 0.3840558 1661
## 32 Texas Tech 0.3840751 1483
## 33 West Virginia 0.3850142 1527
## 34 Oregon 0.3855798 1469
## 35 Georgetown 0.3858239 1516
## 36 Kentucky 0.3884243 1511
## 37 Purdue 0.3900130 1542
## 38 Rutgers 0.3902279 1388
## 39 Kansas State 0.3925450 1556
## 40 Auburn 0.3943597 1442
## 41 Nebraska 0.3973988 1384
## 42 Mississippi State 0.3979026 1291
## 43 Clemson 0.4016561 1559
## 44 Xavier 0.4037801 1746
## 45 Creighton 0.4058568 1546
## 46 Virginia Tech 0.4081787 1455
## 47 Washington State 0.4107110 1487
## 48 Alabama 0.4107143 1652
## 49 South Carolina 0.4139021 1266
## 50 Ohio State 0.4178000 1524
## 51 Michigan State 0.4179067 1454
## 52 Texas A&M 0.4225067 1484
## 53 Northwestern 0.4246951 1312
## 54 Villanova 0.4250354 1414
## 55 Iowa 0.4312811 1608
## 56 Maryland 0.4358213 1408
## 57 Florida State 0.4382381 1455
## 58 DePaul 0.4423880 1503
## 59 Arizona 0.4465616 1708
## 60 Arkansas 0.4481873 1498
## 61 Wake Forest 0.4489268 1618
## 62 Oregon State 0.4496503 1320
## 63 Butler 0.4502454 1494
## 64 Miami (FL) 0.4525071 1561
## 65 Louisiana State 0.4526316 1330
## 66 Syracuse 0.4531646 1580
## 67 UCLA 0.4538479 1564
## 68 NC State 0.4692400 1658
## 69 Penn State 0.4733653 1458
## 70 Louisville 0.4849333 1250
## 71 Michigan 0.4866102 1475
## 72 Wisconsin 0.4868914 1246
## 73 Southern California 0.4918889 1500
## 74 Kansas 0.4944242 1500
## 75 North Carolina 0.5031684 1657
## 76 Utah 0.5207197 1572